Conversation

@aahrun
Contributor

@aahrun aahrun commented Nov 7, 2025

Adding to @ehsanmok and @Ahajha's recent work to try to set up Mac CI.

The run.sh script used the $OSTYPE variable to determine whether the host
system is Apple. This variable is set by bash itself rather than inherited from
the environment, so it is not guaranteed to be available, and that is
potentially why CI is failing to evaluate the host OS as Mac in this branch.
Using uname -s is likely a more robust check locally and in CI.
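
A minimal sketch of the `uname -s` approach, for illustration only. The function name `detect_os` and its labels are hypothetical and not taken from run.sh; the optional argument exists only so the mapping can be exercised on a machine that is not a Mac.

```shell
# Hypothetical helper, not the actual run.sh code.
# Maps a kernel name (as printed by `uname -s`) to a short OS label.
detect_os() {
    # Use the supplied kernel name if given, otherwise ask uname.
    kernel="${1:-$(uname -s)}"
    case "$kernel" in
        Darwin) echo "macos" ;;
        Linux)  echo "linux" ;;
        *)      echo "unknown" ;;
    esac
}

# Typical use: capture the label once and branch on it later.
os="$(detect_os)"
echo "Detected OS: $os"
```

Because `uname` is a POSIX utility rather than a shell-provided variable, this check should behave the same whichever shell ends up running the script.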

Ahajha and others added 3 commits November 6, 2025 13:27
The run.sh script used the $OSTYPE variable to determine whether the host
system is Apple. This variable is set by bash itself rather than inherited from
the environment, so it is not guaranteed to be available, and that is
potentially why CI is failing on Mac in this branch. Using uname -s is likely
a more robust check locally and in CI.
@aahrun aahrun changed the title from "adron/add macos gpu runner to ci" to "Fix platform evaluation on macos CI" Nov 7, 2025
@aahrun
Contributor Author

aahrun commented Nov 7, 2025

$OSTYPE was likely a red herring and not the reason that the OS check was failing to evaluate. Needs a little more debugging.

Replace use of system_profiler for identification with another
call to uname. It may be that SPDisplaysDataType does not
function as expected on the virtualised github mac runners.
Also we are properly ensuring the host is an arm mac this way.
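
A rough sketch of what such a check could look like, assuming the second `uname` call is `uname -m` (the commit message does not say which flag is used). The function name `is_arm_mac` is hypothetical, and the optional arguments are there only so the logic can be tried without a Mac.

```shell
# Hypothetical helper, not the actual run.sh code.
# Succeeds only when the kernel is Darwin and the machine type is arm64.
is_arm_mac() {
    kernel="${1:-$(uname -s)}"
    machine="${2:-$(uname -m)}"
    [ "$kernel" = "Darwin" ] && [ "$machine" = "arm64" ]
}

if is_arm_mac; then
    echo "Apple Silicon Mac detected"
else
    echo "Not an Apple Silicon Mac"
fi
```

One caveat with this design: a shell running under Rosetta reports `x86_64` from `uname -m`, so the check would fail there even on Apple Silicon hardware.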
@aahrun
Contributor Author

aahrun commented Nov 7, 2025

Now appears to correctly identify the Mac system and Metal support.

@Ahajha
Contributor

Ahajha commented Nov 7, 2025

Interesting, is this an issue with our stack or is it on GitHub's side?

@ehsanmok
Collaborator

ehsanmok commented Nov 7, 2025

Fwiw the gpu-spec script is just a utility script and whether it can find the GPU spec or not is orthogonal to making the CI work.

@ehsanmok
Collaborator

ehsanmok commented Nov 7, 2025

All of the attempts show there's something wrong with the GitHub Mac GPU runners. It works fine locally.

@Ahajha
Contributor

Ahajha commented Nov 7, 2025

My remaining theory is that the GH runners are missing some environment variable that we're somehow relying on, but I have no clue what that would be. Not familiar with our platform detection logic.

@aahrun aahrun closed this Nov 7, 2025
@aahrun aahrun reopened this Nov 7, 2025
@aahrun
Contributor Author

aahrun commented Nov 7, 2025

> Fwiw the gpu-spec script is just a utility script and whether it can find the GPU spec or not is orthogonal to making the CI work.

This may well be true, but I wanted to make sure we could correctly see Metal capability on the host. Fixing the detection also allows us to implement a list of tests that can be skipped because they're not compatible, like the NVIDIA high compute list.

However, the current errors from the CI say:

open-source/max/mojo/stdlib/stdlib/builtin/constrained.mojo:58:6: note: constraint failed: the target architecture '' is invalid or not currently supported

The missing value for architecture in that error makes me think that we're actually looking at a very similar issue during execution, but I haven't got to that yet :)

@aahrun
Contributor Author

aahrun commented Nov 7, 2025

> My remaining theory is that the GH runners are missing some environment variable that we're somehow relying on, but I have no clue what that would be. Not familiar with our platform detection logic.

For your interest: we were using system_profiler SPDisplaysDataType in the run.sh platform detection, which works locally but not in CI, maybe as a symptom of the way the GPU is virtualised or passed through on these runners. Regardless, it wasn't necessary, so I replaced it for the initial user-facing device info. There's also an additional GPU identification script, written in Python and a bit more robust; I might look at relying only on that in a future change.
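
For anyone following along, here is a hedged sketch of how a system_profiler call can be guarded so that it degrades gracefully when it returns nothing, as it seems to on these runners. The helper name `gpu_model_from_profiler` is hypothetical, and the `Chipset Model:` label is an assumption about the text output format rather than something taken from run.sh.

```shell
# Hypothetical helper, not the actual run.sh code.
# Reads `system_profiler SPDisplaysDataType` text on stdin and prints the
# first "Chipset Model" value, or "unknown" when no such line is present.
gpu_model_from_profiler() {
    model="$(sed -n 's/^[[:space:]]*Chipset Model:[[:space:]]*//p' | head -n 1)"
    echo "${model:-unknown}"
}

# Guarded call: system_profiler only exists on macOS.
if command -v system_profiler >/dev/null 2>&1; then
    system_profiler SPDisplaysDataType 2>/dev/null | gpu_model_from_profiler
else
    echo "unknown"
fi
```

Keeping the parsing separate from the call means an empty or unexpected report ends up as "unknown" instead of breaking the rest of the script.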

@Ahajha
Contributor

Ahajha commented Nov 7, 2025

Curious why we need these detection scripts in the first place. Does Mojo not do the right thing™ and just detect the GPUs (ignoring the Apple issue, but other GPUs specifically)?

@aahrun
Contributor Author

aahrun commented Nov 7, 2025

> Curious why we need these detection scripts in the first place. Does Mojo not do the right thing™ and just detect the GPUs (ignoring the Apple issue, but other GPUs specifically)?

Some of this is for limiting the tests executed based on detected hardware capability, which I think makes sense especially given that this repo is being used by people who might well be using consumer hardware. But I think some is just legacy and can be optimised.

@ehsanmok
Collaborator

ehsanmok commented Nov 7, 2025

The script is used initially for users to see the specs of their GPU, and in the tests to detect which puzzles to skip, especially because the later ones require H100+ and this CI runs on T4. So it's not only cleaner but also less confusing for users who don't have access to our tier 1 GPUs.
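
As a loose illustration of that gating, here is a sketch that assumes puzzles are gated on CUDA compute capability (7.5 for T4, 9.0 for H100). The thread does not say how the real tests decide, and the function name `should_skip` is hypothetical.

```shell
# Hypothetical helper, not the repository's actual test logic.
# Succeeds (exit 0) when the detected compute capability is below the
# capability a puzzle requires, meaning the puzzle should be skipped.
# Assumes "major.minor" strings with a single-digit minor, e.g. 7.5 or 9.0.
should_skip() {
    required="$(printf '%s' "$1" | tr -d '.')"
    detected="$(printf '%s' "$2" | tr -d '.')"
    [ "$detected" -lt "$required" ]
}

# Example: a puzzle that needs 9.0 (H100) on a runner that reports 7.5 (T4).
if should_skip 9.0 7.5; then
    echo "skip"
else
    echo "run"
fi
```

Comparing a single number per device keeps the skip list declarative: each puzzle only has to state the minimum capability it needs.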

A temporary mechanism to collect some information about the
hardware from system_profiler to see if anything reports
unusually. I think these CI runners do something a little
differently.

Trying a little in-line Swift to gather information about the GPU
since it's clear from system_profiler that the CPU appears with
a different designation thanks to being virtualised. Perhaps the
GPU does too, and therefore is not recognised properly by info.mojo
Using Swift since system_profiler SPDisplaysDataType doesn't seem
to work for headless machines.

Fixed the capabilities being requested and moved the swift code
to a temporary standalone script file to make things easier.
3 participants